Abstract

The  paper  demonstrates  how  multi-period  portfolio  optimization  problems 
can  be  efficiently  solved  as  multi-stage  stochastic  linear  programs.  A  scheme 
based  on  a  blending  of  classical  Benders  decomposition  techniques  and  a  special 
technique,  called  importance  sampling,  is  used  to  solve  this  general  class  of 
multi-stage  stochastic  linear  programs.  We  discuss  the  case  where  stochastic 
parameters  are  dependent  within  a  period  as  well  as  between  periods.  Initial 
computational results are presented.

1. Introduction


Methods of Operations Research, especially Mathematical Programming methods, are
receiving broader acceptance in the financial industry. The increasing complexities
and inherent uncertainties in financial markets have led to the need for mathematical
models supporting the decision making process. This paper addresses the portfolio
selection problem. Since Markowitz (1959) [20], several models have been developed
that allow one to determine portfolios with the highest expected returns for a given
level of risk. His model (and certain closely related ones) requires the solution of a
quadratic program. Other approaches model the stochastic nature of the problem
directly as a stochastic program. For example, Mulvey (1987) [21] and Mulvey and
Vladimirou (1989) [22] [23] formulate asset allocation problems as stochastic network
problems.

The  use  of  stochastic  programming  techniques  has  been  hampered  until  recently 
by  the  sheer  size  of  practical  problems  when  they  are  restated  as  deterministic  linear 
problems.  To  solve  them  it  was  necessary  that  the  number  of  scenarios  representing 
uncertainties be kept small. Most models developed so far have been single-stage or
single-period models, that is, models in which the decision making process and
the future events (foresight) are restricted to a single time period. Only a few attempts
have been made to solve practical multi-stage decision making models whose future
events are spread over several periods.

Multi-stage  planning  problems  can  often  be  formulated  as  linear  programs  with 
a  dynamic  matrix  structure  which,  in  the  deterministic  case,  appear  in  a  staircase 
pattern  of  blocks  with  non-zero  submatrices.  These  blocks  correspond  to  and  are 
different for different time periods. In the stochastic case, the blocks of coefficients
and right hand sides in different time periods are functions of several parameters
whose values vary stochastically with dependent and independent distributions which
we assume to be known. The resulting problem is a multi-stage stochastic linear
program. Even for problems with a small number of stochastic parameters per stage,
the size of multi-stage problems when expressed in equivalent deterministic form can
get so large as to appear intractable. The simplest and most studied case is that with
two stages. Stochastic linear programs were first introduced by Dantzig (1955) [4]
and Beale (1955) [1]. Since then they have been studied by many authors; some recent
references are Birge (1985) [3], Ermoliev (1983) [10], Frauendorfer (1988) [12], Higle
and Sen (1989) [14], Kall (1979) [19], Pereira et al. (1989) [25], Rockafellar and Wets
(1989) [28], Ruszczynski (1986) [29], and Wets (1984) [31]. See Ermoliev and Wets
(1988) [11] for a survey of different ways proposed to solve stochastic programs.

A new approach based on Benders decomposition and importance sampling was
introduced by Dantzig and Glynn (1990) [5] and developed jointly by them and
Infanger (1990) [17]. The approach turned out to be very powerful. We demonstrated
its power by solving several practical large-scale stochastic linear programs with
numerous stochastic parameters. Infanger (1991) [18] and Dantzig and Infanger (1991)
[7] report computational results for large-scale problems with up to 52 stochastic
parameters, where the deterministic equivalent problem, had we attempted to express
it explicitly, would have had several billion constraints. These problems were
two-stage problems or belonged to a restricted class of multi-stage problems which could
be reexpressed in the two-stage framework.

2.  The  Multi-Period  Asset  Allocation  Problem 

In this paper we formulate a class of multi-period financial asset allocation problems
(Mulvey and Vladimirou (1989) [22]) and show how they can be solved by adapting
the methodology and software for multi-stage stochastic linear programs.

At the initial time period 1 a certain amount of wealth is available to a decision
maker in assets $i = 1,\dots,n$ and in cash, which we index as asset $n+1$. We denote
by $x_i^0$, $i = 1,\dots,n+1$, the dollar value of the initially available assets. The decision
maker has to decide each period how to rearrange his portfolio to achieve the best return
on his initial investment over time. We consider the problem in discrete time and
define time steps $t = 1,\dots,T$, e.g. by months, with $T$ being the end of the planning
horizon.

At each time period $t$ the investor can either hold on to asset $i$, buy more, or sell
off part (or all) of asset $i$. We denote by $y_i^t$ the amount of asset $i$ sold in period $t$
and by $x_i^t$ the amount of asset $i$ held on to in period $t$. Selling means decreasing
the value of asset $i$ and increasing the value of cash, $x_{n+1}^t$. The investor also has
the choice of using his resulting cash to buy certain amounts of assets $i$. The amount
bought in period $t$ is denoted by $z_i^t$.

Buying and selling cause transaction costs, which we assume to be proportional
to the dollar value of the asset traded. We denote by $100\nu_i$ the transaction
costs (expressed as a percentage) associated with buying one unit of asset $i$ and by
$100\mu_i$ the transaction costs (expressed as a percentage) associated with selling one
unit of asset $i$. Buying 1 unit of asset $i$ requires $1+\nu_i$ units of cash and selling
1 unit of asset $i$ results in $1-\mu_i$ units of cash.
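The cash effect of proportional transaction costs can be illustrated with a minimal sketch. The function name `rebalance`, the sample numbers, and the check against negative positions are assumptions made for illustration only; they are not taken from the paper.

```python
def rebalance(holdings, cash, sells, buys, mu, nu):
    """Apply one period's trades; all amounts are dollar values.

    holdings[i] : value of asset i before trading
    sells[i]    : dollar amount of asset i sold (y_i)
    buys[i]     : dollar amount of asset i bought (z_i)
    mu[i], nu[i]: proportional selling / buying cost rates
    """
    # New holding of each asset: old value minus sales plus purchases.
    new_holdings = [x - y + z for x, y, z in zip(holdings, sells, buys)]
    # Selling one unit yields 1 - mu_i units of cash.
    proceeds = sum((1.0 - m) * y for m, y in zip(mu, sells))
    # Buying one unit costs 1 + nu_i units of cash.
    outlay = sum((1.0 + v) * z for v, z in zip(nu, buys))
    new_cash = cash + proceeds - outlay
    # Assumption for this sketch: no short selling and no borrowing.
    if min(new_holdings) < 0 or new_cash < 0:
        raise ValueError("trade would require short selling or borrowing")
    return new_holdings, new_cash


# Hypothetical two-asset example: sell 20 of asset 0, buy 25 of asset 1.
holdings, cash = rebalance(
    holdings=[100.0, 50.0], cash=10.0,
    sells=[20.0, 0.0], buys=[0.0, 25.0],
    mu=[0.01, 0.01], nu=[0.02, 0.02],
)
```

In this example the sale brings in 19.8 units of cash and the purchase consumes 25.5, so the round trip through cash is what the transaction cost rates penalize.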

Through buying and selling the investor can restructure his portfolio in each time
period $t$. Once this $t$-th stage decision is made, the holdings $x_i^t$, $i = 1,\dots,n+1$ can
be calculated. The shares in the portfolio are then kept constant until the next time
period. The value of $x_i^t$ is affected by the returns on the market. For example, a
holding $x_i^t$ at time $t$ changes its value to $R_i^t x_i^t$, where $R_i^t$ denotes the return
factor from period $t$ to period $t+1$.

At time $t$, when the decision on rearranging the portfolio has to be made, the returns
$R_i^t$, for $i = 1,\dots,n$ are not known to the decision maker with certainty. Only the
return on cash, $R_{n+1}^t$, is assumed known. However, we assume we know the probability
distributions of $R_i^t$. The problem is of the “wait-and-see” type. While the decision at $t$
has to be made on the basis of distributions of future returns $R_i^s$, for $i = 1,\dots,n$,
$s = t,\dots,T$, the values of prior returns $R_i^s$, $i = 1,\dots,n$, $s = 1,\dots,t-1$ have already
been observed. We denote by $R^t = (R_i^t)$, for $i = 1,\dots,n$, the $n$-dimensional random
vector with outcomes $r^t(\omega_t)$, $\omega_t \in \Omega_t$, with $p(\omega_t)$ the corresponding probability
and $\Omega_t$ the set of all possible outcomes in $t$. The random returns $R_i^t$ of period $t$ are
mutually dependent and dependent on the random parameters of the previous period.

After the last period $T$ no decision is made. Only the value of the portfolio is
determined by adding all values of assets including the last period returns. We call
this value $v^T$. The goal of the decision maker is to maximize $E\,u(v^T)$, the
expected utility of the value of the portfolio after period $T$. The utility function
$u(v^T)$ describes the way the investor views risk. If $u(v^T)$ is linear, it describes risk
neutrality; if $u(v^T)$ is concave, it models risk averseness. Nonlinear utility functions
require nonlinear programming techniques for the solution of the problem. Our
methodology is not restricted to linear problems. However, for the sake of ease and
computational speed we approximate the nonlinear function by a piecewise linear
function with a sufficiently large number of linear segments.
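As a rough illustration of the piecewise linear approximation of a concave utility function, the sketch below builds chords between breakpoints and evaluates the approximation as the minimum over all segments, which is the form a linear program can use when maximizing. The choice of a logarithmic utility, the breakpoints, and the function names are assumptions for this example only.

```python
import math


def chord_segments(u, breakpoints):
    """Return (slope, intercept) of each chord of u between consecutive breakpoints."""
    segments = []
    for a, b in zip(breakpoints, breakpoints[1:]):
        slope = (u(b) - u(a)) / (b - a)
        segments.append((slope, u(a) - slope * a))
    return segments


def piecewise_utility(v, segments):
    """Concave piecewise linear utility: the minimum over all linear segments.

    For a concave u every chord line lies above u outside its own interval,
    so the minimum selects the chord that belongs to the interval containing v.
    """
    return min(slope * v + intercept for slope, intercept in segments)


# Hypothetical example: log utility on terminal wealth between 0.5 and 3.0.
breakpoints = [0.5 + 0.25 * k for k in range(11)]
segments = chord_segments(math.log, breakpoints)
grid = [0.5 + 0.01 * k for k in range(251)]
worst_error = max(abs(math.log(v) - piecewise_utility(v, segments)) for v in grid)
```

Adding breakpoints where the utility curves most strongly reduces `worst_error`; this is the trade-off between the number of linear segments and accuracy mentioned above.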

In the model presented here we do not consider short selling of assets, although
this feature could be incorporated easily. We also do not consider borrowing of cash,
which could also be incorporated easily. The holdings of assets, as well as the amounts
of assets sold or bought, have to be nonnegative. In general there are also lower
($\underline{x}$) and upper ($\overline{x}$) bounds on holdings as well as on amounts of assets to be
sold ($\underline{y}$, $\overline{y}$) or to be bought ($\underline{z}$, $\overline{z}$), which are given by the investor and/or
by the market. E.g. a certain asset may only be available up to a certain amount, or an
investor wants to have a certain asset with at least a certain amount of dollar value in
the portfolio. Therefore in general we formulate $\underline{x}_i^t \le x_i^t \le \overline{x}_i^t$,
$\underline{y}_i^t \le y_i^t \le \overline{y}_i^t$, $\underline{z}_i^t \le z_i^t \le \overline{z}_i^t$, where $\underline{x}_i^t \ge 0$, $\underline{y}_i^t \ge 0$,
$\underline{z}_i^t \ge 0$, $x_i^0$ given for $i = 1,\dots,n+1$, $t = 1,\dots,T$.

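We can now state the model. For $t = 1,\dots,T$, with $R_i^0 x_i^0$, $i = 1,\dots,n+1$, given:

$$
\begin{array}{rcll}
\max\ E\,u(v^T) & & & \\[4pt]
-R_i^{t-1} x_i^{t-1} + x_i^t + y_i^t - z_i^t & = & 0, & i = 1,\dots,n, \\[4pt]
-R_{n+1}^{t-1} x_{n+1}^{t-1} + x_{n+1}^t - \sum_{i=1}^{n} (1-\mu_i)\, y_i^t + \sum_{i=1}^{n} (1+\nu_i)\, z_i^t & = & 0, & \\[4pt]
-\sum_{i=1}^{n+1} R_i^T x_i^T + v^T & = & 0, & \\[4pt]
\underline{x}_i^t \le x_i^t \le \overline{x}_i^t, & & & i = 1,\dots,n+1, \\[4pt]
\underline{y}_i^t \le y_i^t \le \overline{y}_i^t, \quad \underline{z}_i^t \le z_i^t \le \overline{z}_i^t, & & & i = 1,\dots,n.
\end{array}
$$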

We describe correlation between asset returns using a factor model. Using factors
is common in the financial industry (e.g. Perold (1984) [27]), hence historical data of
various factors are commercially available. The idea of the factor model is to relate
the vector of asset returns $R^t = (R_1^t,\dots,R_n^t)'$ to factors $V^t = (V_1^t,\dots,V_h^t)'$. While
the number of assets $n$ is large, e.g. a model should be able to handle about 500 to
3000 assets, the number of factors $h$ is comparatively small. Factor models used in
the financial industry typically involve no more than 20 different time series called
factors. The factor matrix $F$ ($n \times h$) relates $R^t$ to $V^t$:

$$R^t = F V^t$$

The coefficients of the factor matrix are estimated using regression analyses on
historical data. By linear transformations of historical factors the transformed factors
can always be determined in such a way that the factors $V^t$ are orthogonal. These
factors can then be interpreted as independent random parameters, assumed normally
or log-normally distributed. Using the factor model, stochastically
dependent returns can be generated in the computer from these stochastically
independent factors. We denote an outcome of the random factor $V_i^t$ by $v_i^t(\omega_t)$,
also denoted as $v_i^t$, with corresponding probability $p(v_i^t)$, where
$p(v_i^t) = \mathrm{prob}(V_i^t = v_i^t)$.
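To make the factor model concrete, the following sketch draws independent discrete factor outcomes and maps them through a factor matrix to obtain dependent asset returns. The matrix `F`, the outcome sets, and the probabilities are invented for illustration; in practice they would come from the regression analysis described above.

```python
import random


def sample_returns(F, outcomes, probs, rng):
    """Draw independent factor outcomes v and return (v, R) with R = F v."""
    v = [rng.choices(o, weights=p)[0] for o, p in zip(outcomes, probs)]
    R = [sum(f * vj for f, vj in zip(row, v)) for row in F]
    return v, R


def correlation(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical data: n = 3 assets, h = 2 independent factors, 3 outcomes each.
F = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5]]
outcomes = [[0.9, 1.0, 1.1], [0.95, 1.0, 1.05]]
probs = [[0.25, 0.5, 0.25], [0.3, 0.4, 0.3]]

rng = random.Random(0)
draws = [sample_returns(F, outcomes, probs, rng)[1] for _ in range(5000)]
corr_01 = correlation([r[0] for r in draws], [r[1] for r in draws])
```

Although the two factors are sampled independently, the returns of assets 0 and 1 are strongly correlated because both load on the first factor.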

We  also  consider  inter-period  dependency.  For  example  we  may  wish  to  have  a 
higher  probability  of  having  a  high  rate  of  return  in  period  t  if  it  was  high  in  period 
t  —  1  than  if  it  was  low  in  period  t  —  1.  We  can  model  this  inter-period  dependency 
as  a  Markovian  type  process  applied  directly  on  the  factors: 

$$v_i^t = v_i^{t-1} + \eta_i^t, \quad i = 1,\dots,h$$

The value of factor $i$ in period $t$ is the sum of the value of factor $i$ in the previous
period $t-1$ and an independent random variation of the factor in $t$, denoted by
$\eta_i^t$. The Markovian type model can be estimated from historical data. Instead
of having an additive effect as above we may prefer to have a multiplicative effect
by applying the Markovian process directly to the logs of the factors. We have not
explored this alternative.
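The additive Markovian dependency amounts to a random walk in each factor. A minimal sketch, with an invented starting value and invented increment distribution:

```python
import random


def simulate_factor_path(v0, eta_outcomes, eta_probs, periods, rng):
    """Simulate one factor path where each value is the previous value plus
    an independently drawn increment eta."""
    path = [v0]
    for _ in range(periods):
        eta = rng.choices(eta_outcomes, weights=eta_probs)[0]
        path.append(path[-1] + eta)
    return path


# Hypothetical factor starting at 1.0 with three possible increments per period.
eta_outcomes = [-0.05, 0.0, 0.05]
rng = random.Random(0)
path = simulate_factor_path(1.0, eta_outcomes, [0.3, 0.4, 0.3], periods=3, rng=rng)
```

A high value in one period makes a high value in the next period more likely, even though the increments themselves are independent across periods.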

3.  Multi-Stage  Stochastic  Linear  Programs 

As  one  can  now  see  easily,  the  multi-period  asset  model  proposed  fits  exactly  into  the 
framework  of  a  general  class  of  multi-stage  stochastic  linear  programs  with  recourse. 
The factor model for generating dependent returns and the Markovian process for
inter-period dependency define a special class of dependencies between stochastic
parameters  which  we  will  exploit  to  solve  the  problem.  Before  doing  so  we  state  the 
general  problem  and  the  methodology  we  have  developed  to  solve  it. 

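The multi-stage stochastic linear program can be formulated as follows:

$$
\begin{array}{rl}
\min\ z = & c_1 x_1 + E_{\omega_2}\left( c_2 x_2^{\omega_2} + \cdots + E_{\omega_T}\left( c_T x_T^{\omega_T,\dots,\omega_2} \right) \cdots \right) \\[4pt]
\text{s/t} & A_1 x_1 = b_1 \\[4pt]
 & -B_1^{\omega_2} x_1 + A_2 x_2^{\omega_2} = b_2^{\omega_2} \\[4pt]
 & \qquad \vdots \\[4pt]
 & -B_{T-1}^{\omega_T} x_{T-1}^{\omega_{T-1},\dots,\omega_2} + A_T x_T^{\omega_T,\dots,\omega_2} = b_T^{\omega_T} \\[4pt]
 & x_1,\ x_2^{\omega_2},\ \dots,\ x_T^{\omega_T,\dots,\omega_2} \ge 0 \\[4pt]
 & \omega_t \in \Omega_t,\quad t = 2,\dots,T
\end{array}
$$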

The problem is the stochastic extension of a deterministic dynamic linear program.
While the first stage parameters $c_1$, $A_1$, $b_1$ are known to the planner with certainty,
the parameters of stages $2,\dots,T$ are assumed known only by their distribution. We
assume uncertainty in the coefficients of the transition matrices $B_t^{\omega_{t+1}}$, $t = 1,\dots,T-1$
and the right hand sides $b_t^{\omega_t}$, $t = 2,\dots,T$, and assume the coefficients of the technology
matrices $A_t$, $t = 2,\dots,T$ and the objective function coefficients $c_t$, $t = 2,\dots,T$ to
be known with certainty. The goal of the planner is to minimize the expected value
of present and future costs.

The underlying “wait-and-see” decision making process is as follows: The decision
maker makes a first stage decision $x_1$ before observing any outcome of random
parameters. Then he waits until an outcome of the second stage random parameters
is realized. The second stage decision is then made based on the knowledge of the
realization $\omega_2$ but without observing any outcome of the random parameters of stages
$3,\dots,T$, and so forth. As the state (the actual outcome) is carried forward to the
following period, the decision tree grows exponentially with the number of stages. We
consider discrete distributions of random parameters with a finite number of outcomes,
e.g. $\omega_t \in \Omega_t$, $\Omega_t = \{1,\dots,K_t\}$, $t = 2,\dots,T$. With $K_t$ being the number of scenarios
in period $t$, the total number of scenarios for all $T$ stages is $\prod_{t=2}^{T} K_t$. The number
$K_t$ is expected to be large, as it is computed by crossing the sets of possible
outcomes of the different random parameters within a period. E.g. if the dimension of
the random vector in period $t$ is $h_t$ and the outcome set of its $j$-th component contains
$k_j$ elements, then $K_t = \prod_{j=1}^{h_t} k_j$.
For example, in the asset allocation problem, consider the case of 20 factors, modeled
as random parameters with 5 outcomes each: the number of scenarios per period is
$5^{20} \approx 10^{14}$. If there are 3 periods, then the total number of scenarios grows to $10^{42}$.

The dimensions of an equivalent linear program of an asset allocation problem with
a universe of about 500 assets are approximately $5 \cdot 10^{44}$ rows and $1.5 \cdot 10^{45}$ columns.
It is of course impossible to write down this linear program explicitly.
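The growth in the number of scenarios is simple arithmetic and can be checked directly. The sketch below uses the 20 factors with 5 outcomes each and the 3 periods from the example; the function name is an assumption.

```python
import math


def scenarios_per_period(outcomes_per_factor):
    """Number of scenarios obtained by crossing the outcome sets of all factors."""
    return math.prod(outcomes_per_factor)


per_period = scenarios_per_period([5] * 20)   # 20 factors, 5 outcomes each
total = per_period ** 3                       # 3 periods with random returns

magnitude_per_period = math.log10(per_period)
magnitude_total = math.log10(total)
```

The per-period count is just under $10^{14}$, and three periods give a total of order $10^{42}$, which is why the equivalent deterministic linear program cannot be written down.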

It  is  clear  that  the  multi-period  asset  allocation  problem  defined  above  is  a  special 
case  of  the  multi-stage  stochastic  linear  program.  The  correspondence  is  as  follows: 
the vector $x_t$ now denotes the vector of all decision variables (holdings, amounts to
be bought and to be sold) in period $t$. Uncertainty occurs only in the transition
matrices $B_t$, which contain on their diagonal the return factors $R_i^t$. The right hand
sides $b_2,\dots,b_T$ are zero, as are the objective function coefficients $c_2,\dots,c_{T-1}$. We
now describe the techniques we have developed to solve the multi-stage program.

4.  Benders  Decomposition 

A description of how the decomposition algorithm of Benders (1962) [2] can be applied
to solve stochastic linear programs can be found in Van Slyke and Wets (1969) [30] and
Birge (1985) [3]. Using Benders decomposition we decompose the problem into subproblems
of different stages $t$. In the most general case, where there is a dependency of stochastic
parameters between stages, the number of subproblems is equal to the number of
scenarios in each stage $t$. To distinguish one subproblem from another, each is indexed
with $\omega_t,\dots,\omega_2$, where $\omega_t$ is the random event in stage $t$ and $\omega_{t-1},\dots,\omega_2$ is the path
of previous events which gave rise to the particular subproblem in stage $t$.

For expository purposes, we assume initially that the random events that happen in
one stage are independent of those that happen in the next stage. This is the case, for
example, when the probability of having a high rate of return in period $t$ is the same
for all values of the rate of return in period $t-1$. In the independent case the scenarios
$\omega_{t+1} \in \Omega_{t+1}$ in period $t+1$ are identical for each scenario $\omega_t \in \Omega_t$ in period $t$.
The history is only carried forward through optimal decisions from previous periods.
In the special class of Markovian dependency which we described earlier,
$B_t = B_{t-1} + \epsilon_t$,
where $\epsilon_t$ represents a matrix of random parameters independent of those in period
$t-1$.

The idea of using Benders decomposition is to express in each stage $t$, $t =
1,\dots,T-1$ and scenario $\omega_t$ the expected future costs (the impact of stages $t+1,\dots,T$)
by a scalar $\theta_t$ and “cuts”, necessary conditions for feasibility and optimality which are
expressed only in terms of the stage $t$ decision variables $x_t$ and $\theta_t$. Cuts are initially
absent and then sequentially added to the stage $t$ problems. Each scenario subproblem
$\omega_t$ in stage $t$ collects the information about expected future costs by means of
the cuts.

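The relation between the stages and scenarios in the decomposed multi-stage
problem is summarized as follows:

Stage 1 problem:

$$
\begin{array}{rrl}
\min\ z_1 = & c_1 x_1 + \theta_1 & \\[4pt]
\text{s/t}\quad \pi_1: & A_1 x_1 & = b_1 \\[4pt]
\rho_1^l: & -G_1^l x_1 + \theta_1 & \ge g_1^l, \quad l = 1,\dots,L_1 \\[4pt]
 & x_1,\ \theta_1 & \ge 0
\end{array}
$$

Stages $t = 2,\dots,T-1$ problem:

$$
\begin{array}{rrl}
\min\ z_t^{\omega_t} = & c_t x_t^{\omega_t} + \theta_t^{\omega_t} & \\[4pt]
\text{s/t}\quad \pi_t^{\omega_t}: & A_t x_t^{\omega_t} & = b_t^{\omega_t} + B_{t-1}^{\omega_t} \hat{x}_{t-1} \\[4pt]
\rho_t^{l,\omega_t}: & -G_t^l x_t^{\omega_t} + \theta_t^{\omega_t} & \ge g_t^l, \quad l = 1,\dots,L_t \\[4pt]
 & x_t^{\omega_t},\ \theta_t^{\omega_t} & \ge 0
\end{array}
$$

Stage $T$ problem:

$$
\begin{array}{rrl}
\min\ z_T^{\omega_T} = & c_T x_T^{\omega_T} & \\[4pt]
\text{s/t}\quad \pi_T^{\omega_T}: & A_T x_T^{\omega_T} & = b_T^{\omega_T} + B_{T-1}^{\omega_T} \hat{x}_{T-1} \\[4pt]
 & x_T^{\omega_T} & \ge 0
\end{array}
$$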

Here $\min z_1$ represents the optimal objective function value in the first stage; $\hat{x}_1$, $\hat{\theta}_1$
represent the optimal solution; the vector $\pi_1$ denotes the optimal dual prices associated
with the original stage 1 constraints; and the scalars $\rho_1^l$ are the optimal dual prices
associated with the cuts which have been added so far in iterations $l = 1,\dots,L_1$.

The optimal objective function values $\min z_t^{\omega_t} = \min z_t^{\omega_t}(\hat{x}_{t-1})$, the optimal
dual prices $\pi_t^{\omega_t} = \pi_t^{\omega_t}(\hat{x}_{t-1})$ associated with the original stage $t$ constraints in stages
$t = 2,\dots,T$, and the optimal dual prices $\rho_t^{l,\omega_t} = \rho_t^{l,\omega_t}(\hat{x}_{t-1})$ associated with the cuts
in stages $t = 2,\dots,T-1$ are all dependent upon the optimal solution $\hat{x}_{t-1}$ passed
as input from the previous stage $t-1$. According to the scenario development in the
previous stages, an optimal solution $\hat{x}_{t-1}$ is actually indexed by the scenario outcomes
of all previous stages and is therefore denoted as $\hat{x}_{t-1}^{\omega_{t-1},\dots,\omega_2}$. For the sake of exposition,
we suppress the scenario history and present the optimal solution of subproblems in
stage $t$, scenario $\omega_t$ as a function of the input $\hat{x}_{t-1}$.

We compute the expected future costs as $z_{t+1} = E_{\omega_{t+1}} z_{t+1}^{\omega_{t+1}}(\hat{x}_t)$, the right hand sides
of the cuts as $g_t^l = E_{\omega_{t+1}}\left(\pi_{t+1}^{\omega_{t+1}} b_{t+1}^{\omega_{t+1}} + \sum_{l'=1}^{L_{t+1}} \rho_{t+1}^{l',\omega_{t+1}} g_{t+1}^{l'}\right)$, and the coefficients of the
cuts as $G_t^l = E_{\omega_{t+1}} \pi_{t+1}^{\omega_{t+1}} B_t^{\omega_{t+1}}$, where $\rho_T = 0$, $G_T = 0$, and $g_T = 0$.

A subproblem in stage $t$ and in scenario $\omega_t$ interacts with its predecessors and
descendants by passing optimal solutions forward and cuts backward. Benders
decomposition splits the multi-stage problem into a series of two-stage relations which
are connected overall by a nesting scheme. We call the stage $t$, scenario $\omega_t$ problem
the current master problem. It receives from its ancestor in period $t-1$ a solution
$\hat{x}_{t-1}$. The current scenario is determined by the outcome $\omega_t$ of the random
parameters in stage $t$, which are reflected in the right hand side $b_t^{\omega_t} + B_{t-1}^{\omega_t}\hat{x}_{t-1}$. As stated
above, $\hat{x}_{t-1}$ has a history. The history has to be considered when nesting several
stages. Given $\omega_t$ and subject to $\hat{x}_{t-1}$ we solve the stage $t$ problem in scenario $\omega_t$ and
pass the obtained solution $\hat{x}_t^{\omega_t}$ to the descendant problems. By solving all problems
$\omega_{t+1} \in \Omega_{t+1}$ (referred to as the universe case) we compute the expected value of the
descendant stage costs $z_{t+1} = E_{\omega_{t+1}} z_{t+1}^{\omega_{t+1}}(\hat{x}_t^{\omega_t})$, the coefficients
$G_t = E_{\omega_{t+1}} \pi_{t+1}^{\omega_{t+1}} B_t^{\omega_{t+1}}$, and the right hand side
$g_t = E_{\omega_{t+1}}\left(\pi_{t+1}^{\omega_{t+1}} b_{t+1}^{\omega_{t+1}} + \sum_{l=1}^{L_{t+1}} \rho_{t+1}^{l,\omega_{t+1}} g_{t+1}^{l}\right)$ of a cut. The
cut is added to the current master problem (stage $t$, scenario $\omega_t$ problem) and by
solving the problem again another trial solution is obtained.

The optimal solution of the current master problem in stage $t$, scenario $\omega_t$ gives
a lower bound, and the expected cost of the trial solution gives an upper bound, of
the expected costs of all scenarios descendant from the stage $t$ scenario $\omega_t$. If lower
bound and upper bound are sufficiently close, the current master problem is said
to represent the future expected cost and contains (by means of a sufficient number
of cuts) all the information needed from future scenarios. In this case we say the
current master is balanced with its descendant problems.
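The interplay of trial solutions, cuts, and the lower and upper bounds can be sketched on a deliberately tiny two-stage problem. Everything specific here is an assumption chosen so that no LP solver is needed: the first stage has a single variable $x$ with cost $c$, and each second-stage subproblem is $\min q y$ subject to $y \ge d - x$, $y \ge 0$, whose optimal dual price is $q$ if $d > x$ and 0 otherwise. It illustrates one master/descendant relation only, not the nested multi-stage scheme.

```python
def expected_recourse(x, demands, probs, q):
    """Solve all second-stage subproblems for trial solution x.

    Returns the expected recourse cost z and the cut (G, g) meaning
    theta >= g + G * x. The right-hand side of each subproblem is d - x,
    i.e. b = d and B = -1, so G = E[pi * B] and g = E[pi * b].
    """
    z = G = g = 0.0
    for d, p in zip(demands, probs):
        pi = q if d > x else 0.0          # optimal dual price of y >= d - x
        z += p * pi * (d - x)
        G += p * pi * (-1.0)
        g += p * pi * d
    return z, G, g


def solve_master(c, x_max, cuts):
    """Minimize c*x + theta s.t. theta >= g + G*x for all cuts, theta >= 0,
    0 <= x <= x_max. In one dimension the optimum lies at a bound or at an
    intersection of two cut lines, so all such points are enumerated."""
    lines = [(0.0, 0.0)] + cuts           # theta >= 0 treated as a line
    candidates = [0.0, x_max]
    for i, (G1, g1) in enumerate(lines):
        for G2, g2 in lines[i + 1:]:
            if G1 != G2:
                x = (g2 - g1) / (G1 - G2)
                if 0.0 <= x <= x_max:
                    candidates.append(x)
    best = None
    for x in candidates:
        theta = max(g + G * x for G, g in lines)
        if best is None or c * x + theta < best[0]:
            best = (c * x + theta, x, theta)
    return best                           # (lower bound, x, theta)


def benders(c, q, x_max, demands, probs, tol=1e-9, max_iter=50):
    cuts = []
    for _ in range(max_iter):
        lower, x, theta = solve_master(c, x_max, cuts)
        z, G, g = expected_recourse(x, demands, probs, q)
        upper = c * x + z                 # expected cost of the trial solution
        if upper - lower <= tol:          # master balanced with descendants
            break
        cuts.append((G, g))
    return x, upper, len(cuts)


# Hypothetical data: four equally likely scenarios.
x_opt, cost, n_cuts = benders(c=1.0, q=3.0, x_max=10.0,
                              demands=[2.0, 4.0, 6.0, 8.0], probs=[0.25] * 4)
```

Each iteration tightens the outer linearization at the current trial point; the loop stops when the master's optimal value (lower bound) meets the expected cost of its own trial solution (upper bound).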

Note that the current master problem represents the expected future costs only
subject to the trial solution $\hat{x}_{t-1}$ which was passed from its ancestor and subject to the
current scenario $\omega_t$. Note also that we have implicitly assumed that the descendant
problems in stage $t+1$ are also balanced with their descendant problems in stage $t+2$
by means of having collected a sufficient number of cuts to represent the expected
costs of descendant scenarios from $t+2$ on, and so forth. However, note that the solution
of the current stage $t$ scenario $\omega_t$ problem gives a lower bound of the expected costs
of all scenarios descendant from the stage $t$ scenario $\omega_t$ problem regardless of having
collected a sufficient number of cuts. We shall exploit this fact.

Two  properties  of  cuts  are  crucial  for  the  solution  procedure: 

1. In the case of independence of stochastic parameters between stages:
The cuts derived from any trial solution $\hat{x}_t^{\omega_t}$ are valid cuts for all subproblems
$\omega_t \in \Omega_t$. E.g. the cut $\theta_t \ge E_{\omega_{t+1}} \pi_{t+1}^{\omega_{t+1}}(\hat{x}_t)\, B_t^{\omega_{t+1}} x_t + g_t$ is a constraint whose
coefficients don't depend on $x_t$, hence it is valid for all values of $x_t$. To see this, note
that the optimal dual prices $\pi_{t+1}^{\omega_{t+1}}(\hat{x}_t)$, $\rho_{t+1}^{\omega_{t+1}}(\hat{x}_t)$ do depend on $\hat{x}_t$ for optimality, but
they remain dual feasible independent of the values of the right hand side as a
function of $x_t$. The validity of the cuts depends only on the dual feasibility of the
dual prices. A cut represents an outer linearization of the future expected cost function
evaluated at $\hat{x}_t$. Different scenarios $\omega_t$ in stage $t$ are distinguished by different
right hand sides of the original stage $t$ constraints, e.g. $A_t x_t = b_t^{\omega_t} + B_{t-1}^{\omega_t}\hat{x}_{t-1}$. The
set of cuts $-G_t^l x_t + \theta_t \ge g_t^l$, $l = 1,\dots,L_t$ represents an outer linearization of the
expected future costs independent of scenarios $\omega_t \in \Omega_t$. The outer linearization
defined by the set of cuts equals the expected future cost function if $E\,z_{t+1}^{\omega_{t+1}}(\hat{x}_t) = \hat{\theta}_t$,
where $\hat{\theta}_t$ is the value of $\theta_t$ corresponding to the solution $\hat{x}_t$ of any stage $t$ problem.
If $E\,z_{t+1}^{\omega_{t+1}}(\hat{x}_t^{\omega_t}) = \hat{\theta}_t^{\omega_t}$ for all $\omega_t \in \Omega_t$, then a sufficient number of necessary cuts have been
generated to represent the expected future costs for all solutions $\hat{x}_t^{\omega_t}$ of scenarios
$\omega_t \in \Omega_t$ in stage $t$, and we say stage $t$ is balanced with stage $t+1$.

2. In the case of dependency of stochastic parameters between stages:
Cuts now depend on scenario $\omega_t$ in period $t$. Sharing of cuts between different scenario
subproblems $\omega_t \in \Omega_t$ is no longer directly possible. However, for additive dependency
(e.g. Markovian type dependency) cuts can be easily adjusted to different scenarios.
(See Pereira and Pinto (1989) [26] for additive dependent right hand sides.) For
example, in the case of the Markovian type dependency which we introduced in the
multi-period asset allocation problem, $B_t = B_{t-1} + \epsilon_t$. Here $\epsilon_t$ represents a
matrix whose elements are functions of random parameters which are independent of
the period $t-1$ random parameters. (The elements of $\epsilon_t$ are the independent part of the
random returns and are generated by the product $F\eta^t$, where $\eta^t$ is the random change
in $V^t$ that generated $\epsilon_t$.) In the case of the additive dependency a cut in stage $t$ and
scenario $\omega_t$ has the form: $\theta_t \ge E_{\omega_{t+1}} \pi_{t+1}^{\omega_{t+1}} \left(B_{t-1}^{\omega_t} + \epsilon_t^{\omega_{t+1}}\right) x_t + g_t$.
It can easily be seen that the coefficients of the cut consist of a part independent of
scenarios $\omega_t$ and a dependent part. The cut can be adjusted to different scenarios
$\omega_t \in \Omega_t$ by adding the scenario dependent part $\left(E_{\omega_{t+1}} \pi_{t+1}^{\omega_{t+1}}\right) B_{t-1}^{\omega_t}$ according to scenario
$\omega_t$. This requires storing the expected value of the dual variables $\pi_{t+1}^{\omega_{t+1}}$.

Taking advantage of the above stated properties we actually only need to store
one subproblem per stage $t$. For different scenarios $\omega_t$ and different solutions $\hat{x}_{t-1}$
passed from the previous stage we determine the right hand side accordingly. The
cuts are valid for all scenarios $\omega_t \in \Omega_t$ in the case of independence of the stochastic
parameters between stages, or are adjusted in the gradient according to the actual
scenario $\omega_t$ in the case of Markovian type dependency between stages. Therefore it is
easily possible to generate any $\omega_t$ subproblem. Future information is represented in
the cuts which have been generated so far and can be efficiently used in any scenario
$\omega_t \in \Omega_t$ independently of which scenario originated it.

5.  Multidimensional  Integration 

The computation of the expected future costs $z_{t+1}$, the coefficients $G_t$ and the right
hand side $g_t$ of the cuts requires the computation of multiple integrals or multiple
sums. The expected value of the second stage costs in period $t+1$ (we suppress the
index $t$ for this discussion), e.g. $z = E\,z^\omega = E(C)$, is an expectation of functions
$C(v^\omega)$, $\omega \in \Omega$, where $C(v^\omega)$ is obtained by solving a linear program. $V$ (in general) is
an $h$-dimensional random vector parameter, e.g. $V = (V_1,\dots,V_h)$, with outcomes
$v^\omega = (v_1,\dots,v_h)^\omega$. For example, $V_i$ represents the value of the $i$-th factor and $v^\omega$ the
observed random outcome. The vector $v^\omega$ is also denoted by $v$, and $p(v^\omega)$, alias $p(v)$,
denotes the corresponding probability. $\Omega$ is the set of all possible random events and is
constructed by crossing the sets of outcomes: $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_h$. With $P$ being the
probability measure, under the assumption of independence the integral
$E\,C(V) = \int C(v^\omega)\,P(d\omega)$ takes the form of a multiple integral
$E\,C(V) = \int \cdots \int C(v)\,p(v)\,dv_1 \cdots dv_h$, or, in the case
of discrete distributions, the form of a multiple sum
$E\,C(V) = \sum_{v_1} \cdots \sum_{v_h} C(v)\,p(v)$,
where $p(v) = p_1(v_1) \cdots p_h(v_h)$.
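For a small number of factors the multiple sum can be evaluated by brute force, which also makes its cost visible: one function evaluation per element of the crossed outcome sets. In the sketch below the cost function is a simple stand-in for the optimal value of a linear program, and the outcome sets and probabilities are invented.

```python
import itertools


def expected_cost(cost, outcomes, probs):
    """Evaluate E C(V) as a multiple sum over all combinations of outcomes of
    independent factors. Returns the expectation and the number of terms."""
    total, terms = 0.0, 0
    index_sets = [range(len(o)) for o in outcomes]
    for idx in itertools.product(*index_sets):
        v = [outcomes[i][k] for i, k in enumerate(idx)]
        p_v = 1.0
        for i, k in enumerate(idx):
            p_v *= probs[i][k]            # p(v) = p_1(v_1) * ... * p_h(v_h)
        total += cost(v) * p_v
        terms += 1
    return total, terms


# Hypothetical example: h = 4 factors with 3 outcomes each gives 3**4 terms.
outcomes = [[1.0, 2.0, 3.0]] * 4
probs = [[0.2, 0.5, 0.3]] * 4
value, terms = expected_cost(lambda v: max(0.0, 9.0 - sum(v)), outcomes, probs)
```

With 20 factors and 5 outcomes each, the same loop would need about $10^{14}$ cost evaluations, each of them a linear program in the setting of this paper.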

The number of terms in the multiple sum computation gets astronomically large
and therefore the evaluation of multiple sums by direct summation is not practical.
This is especially true because function evaluations are computationally expensive,
since the evaluation of each term in the multiple sum requires the solution of a linear
program. In the following we discuss a scheme for estimating the expected values
with a sufficiently low estimation error without having to evaluate each term.

6. Importance Sampling


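Monte Carlo methods are recommended to compute multiple integrals or multiple
sums for higher $h$-dimensional sample spaces (Davis and Rabinowitz (1984) [9], Glynn
and Iglehart (1989) [13]). Suppose $C^\omega = C(v^\omega)$ are independent random variates
of $v^\omega$, $\omega = 1, \ldots, n$, with expectation $z$, where $n$ is the sample size. An unbiased
estimator of $z$ with variance $\sigma_{\bar z}^2 = \sigma^2 / n$, $\sigma^2 = \mathrm{var}(C(V))$, is

$$\bar z = \frac{1}{n} \sum_{\omega=1}^{n} C^\omega .$$

Note that the standard error decreases with $n^{-0.5}$ and the convergence rate of $\bar z$ to $z$ is
independent of the dimension of the sample space $h$. We rewrite $z = \sum_{\omega \in \Omega} C(v^\omega)\,p(v^\omega)$
as

$$z = \sum_{\omega \in \Omega} \frac{C(v^\omega)\,p(v^\omega)}{q(v^\omega)}\, q(v^\omega)$$

by introducing a new probability mass function $q(v^\omega)$, and we obtain a new estimator
of $z$,

$$\bar z = \frac{1}{n} \sum_{\omega=1}^{n} \frac{C(v^\omega)\,p(v^\omega)}{q(v^\omega)},$$

by sampling from $q(v^\omega)$. The variance of $\bar z$ is given by

$$\mathrm{var}(\bar z) = \frac{1}{n} \sum_{\omega \in \Omega} \left( \frac{C(v^\omega)\,p(v^\omega)}{q(v^\omega)} - z \right)^2 q(v^\omega).$$

Choosing $q^*(v^\omega) = C(v^\omega)\,p(v^\omega) / \sum_{\omega \in \Omega} C(v^\omega)\,p(v^\omega)$ would lead to $\mathrm{var}(\bar z) = 0$,
which means one could get a perfect estimate of the multiple sum from only one observation.
Practically, however, this is useless, since to compute $q^*(v^\omega)$ we have to know
$z = \sum_{\omega \in \Omega} C(v^\omega)\,p(v^\omega)$, which is what we are trying to compute in the first place.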
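
The zero-variance property can be checked on a tiny discrete example. The sketch below is only an illustration under stated assumptions: the three outcomes, their probabilities and the quadratic `cost` function are invented stand-ins for the linear-program values $C(v^\omega)$, and `crude_mc` and `importance_mc` are hypothetical helper names.

```python
import random

def crude_mc(cost, outcomes, p, n, rng):
    """Crude Monte Carlo: sample from p and average the costs."""
    draws = rng.choices(outcomes, weights=p, k=n)
    return sum(cost(v) for v in draws) / n

def importance_mc(cost, outcomes, p, q, n, rng):
    """Importance sampling: sample from q and average C(v) p(v) / q(v)."""
    idx = rng.choices(range(len(outcomes)), weights=q, k=n)
    return sum(cost(outcomes[i]) * p[i] / q[i] for i in idx) / n

# Toy data (assumed for illustration only).
outcomes = [1.0, 2.0, 10.0]
p = [0.6, 0.3, 0.1]
cost = lambda v: v * v          # stand-in for the value of a linear program

# Exact expectation by direct summation (feasible only because the example is tiny).
z_exact = sum(cost(v) * pv for v, pv in zip(outcomes, p))

# The ideal density q* is proportional to C(v) p(v); it needs z_exact itself.
q_star = [cost(v) * pv / z_exact for v, pv in zip(outcomes, p)]

rng = random.Random(0)
z_is = importance_mc(cost, outcomes, p, q_star, 1, rng)   # one draw suffices
z_mc = crude_mc(cost, outcomes, p, 1000, rng)             # still noisy
```

With $q^*$ every sampled term equals $z$, so a single draw reproduces the exact value, whereas the crude estimator still fluctuates after 1000 draws.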

The result, however, helps to derive a heuristic for choosing $q$. It should be
proportional to the product $C(v^\omega)\,p(v^\omega)$ and should have a form that can be integrated
easily. Thus a function $\Gamma(v^\omega) \approx C(v^\omega)$ is sought, which can be integrated with less
effort than $C(v^\omega)$. Additive and multiplicative (in the components of the stochastic
vector $v$) approximation functions and combinations of these are potential candidates
for our approximations. Especially for financial investment problems, we have been
getting good results using $C(V) \approx \sum_{i=1}^{h} C_i(V_i)$. We compute $q$ as

$$q(v^\omega) \approx \frac{C(v^\omega)\,p(v^\omega)}{\sum_{i=1}^{h} \sum_{v_i} C_i(v_i)\,p_i(v_i)} .$$

In this case one has to compute only $h$ 1-dimensional sums instead of one $h$-dimensional
sum. The variance reduction depends on how well the approximation function fits
the original cost function. If the original cost function has the property of additivity
(separability), the multiple sum can be computed exactly by $h$ 1-dimensional sums. If
the additive model is a bad approximation of the cost function, the only "price" that
has to be paid is an increased sample size: if the observed variance is too high for
a starting sample size, the sample size is adjusted upward. Actually we use a variant of
the additive approximation function. By introducing $C(\tau)$, the costs of a base case,
we make the model more sensitive to the impact of the stochastic parameters $v$:

$$\Gamma(V) = C(\tau) + \sum_{i=1}^{h} \Gamma_i(V_i), \qquad \Gamma_i(V_i) = C(\tau_1, \ldots, \tau_{i-1}, V_i, \tau_{i+1}, \ldots, \tau_h) - C(\tau).$$

We denote this as a marginal cost model. $\tau$ can be any arbitrarily chosen point of the
set of values $v_i$, $i = 1, \ldots, h$. For example, we choose $\tau_i$ as that outcome of $V_i$ which
leads to the lowest costs, ceteris paribus.
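
A minimal sketch of the marginal cost model, assuming a toy separable `cost` function in place of the linear-program value and a base case `tau` fixed by hand. The names `marginal_model` and `gamma` are hypothetical. It varies one component at a time around the base case and counts how many cost evaluations that takes.

```python
from itertools import product

def marginal_model(cost, outcomes, tau):
    """Explore the cost function along each coordinate around the base case tau.

    outcomes[i] lists the possible values of V_i; tau is a tuple with one
    value per component. Returns C(tau), the tables Gamma_i, and the number
    of cost evaluations performed.
    """
    base = cost(tau)
    evaluations = 1
    gammas = []
    for i, values in enumerate(outcomes):
        g = {tau[i]: 0.0}                      # Gamma_i(tau_i) = 0 by definition
        for v in values:
            if v != tau[i]:
                g[v] = cost(tau[:i] + (v,) + tau[i + 1:]) - base
                evaluations += 1
        gammas.append(g)
    return base, gammas, evaluations

def gamma(v, base, gammas):
    """Additive approximation Gamma(v) = C(tau) + sum_i Gamma_i(v_i)."""
    return base + sum(g[vi] for g, vi in zip(gammas, v))

# Toy separable cost (assumed); for such a function the model is exact.
outcomes = [(0, 1, 2)] * 3
tau = (0, 0, 0)
cost = lambda v: 5 + v[0] + 2 * v[1] + 3 * v[2]

base, gammas, n_evals = marginal_model(cost, outcomes, tau)
exact_everywhere = all(gamma(v, base, gammas) == cost(v)
                       for v in product(*outcomes))
```

Here three parameters with three outcomes each need 1 + 3·2 = 7 evaluations, against 27 for full enumeration. For a cost function with interaction terms `gamma` would only approximate `cost`.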

Summarizing, the importance sampling scheme has two phases: the preparation
phase and the sample phase. In the preparation phase we explore the cost function
$C(V)$ at the margins to compute the additive approximation function $\Gamma(V)$. For this
process $n_{prep} = 1 + \sum_{i=1}^{h} (k_i - 1)$ subproblems have to be solved, where $k_i$ is the
number of outcomes of $V_i$. Using $\Gamma(V)$ we compute the approximate importance density

$$q(v^\omega) = \frac{\Gamma(v^\omega)\,p(v^\omega)}{C(\tau) + \sum_{i=1}^{h} \sum_{v_i} \Gamma_i(v_i)\,p_i(v_i)} .$$

Next we sample $n$ scenarios from the importance density and, in the sample phase,
solve $n$ linear programs to compute the estimate of $z$ using the Monte Carlo
estimator. We compute the gradient $G$ and the right hand side $g$ of the cut using the
same sample points at hand from the expected cost calculation. See Infanger (1991)
[18] for the computation of the cuts and details of the estimation process.
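
The two phases can be sketched as follows. This is an illustration, not the authors' implementation: it assumes independent discrete parameters, $C(\tau) > 0$ and nonnegative marginal costs, and a cheap toy `cost` function instead of a linear program. It draws from the importance density by treating it as a mixture of the base term and the $h$ marginal terms; the function name `expected_cost_is` is hypothetical.

```python
import random

def expected_cost_is(cost, outcomes, probs, tau, n, rng):
    """Estimate E C(V) by importance sampling with the marginal cost model.

    Assumes C(tau) > 0 and Gamma_i >= 0 so that all mixture weights are valid.
    """
    h = len(outcomes)
    base = cost(tau)                                            # C(tau)
    # Preparation phase: marginal costs (re-evaluates tau itself for brevity).
    gam = [{v: cost(tau[:i] + (v,) + tau[i + 1:]) - base for v in outcomes[i]}
           for i in range(h)]
    # h one-dimensional sums give the normalising constant of q.
    e = [sum(gam[i][v] * pv for v, pv in zip(outcomes[i], probs[i]))
         for i in range(h)]
    z_norm = base + sum(e)

    total = 0.0
    for _ in range(n):                                          # sample phase
        # Pick the mixture component: 0 is the base term, j >= 1 is Gamma_j.
        j = rng.choices(range(h + 1), weights=[base] + e)[0]
        v = []
        for i in range(h):
            w = probs[i]
            if i + 1 == j:                                      # tilt component j
                w = [gam[i][x] * px for x, px in zip(outcomes[i], probs[i])]
            v.append(rng.choices(outcomes[i], weights=w)[0])
        v = tuple(v)
        approx = base + sum(gam[i][v[i]] for i in range(h))     # Gamma(v)
        total += cost(v) * z_norm / approx                      # C(v) p(v) / q(v)
    return total / n

# Toy data (assumed): three parameters, three outcomes each, additive cost.
outcomes = [(0, 1, 2)] * 3
probs = [(0.5, 0.3, 0.2)] * 3
additive = lambda v: 10 + v[0] + 2 * v[1] + 3 * v[2]
estimate = expected_cost_is(additive, outcomes, probs, (0, 0, 0), 5,
                            random.Random(1))
```

Because the toy cost is additive, the ratio `cost(v) / approx` equals one for every draw and the estimate equals the exact expectation regardless of the sample size. With a non-additive cost the same code returns a noisy but unbiased estimate.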
7.  The  Algorithm

By solving a sample of subproblems $\omega_{t+1}$ according to the importance sampling scheme
we compute estimates of the expected future costs $z_{t+1}$, the gradients $G_t^l$ and the
right hand sides $g_t^l$ of the cuts in each stage $t$ and scenario $\omega_t$. The objective function
value of the solution of each stage $t$, scenario $\omega_t$ subproblem gives a valid lower-bound
estimate of the expected costs $z_t^{\omega_t}$, subject to scenario $\omega_t$ and subject to
$x_{t-1}$, the (optimal) solution passed forward from the previous stage. The obtained
lower-bound estimate is the tightest lower bound that can be generated if in stage
$t+1$ a sufficient number of cuts have been added to represent the expected future costs
with respect to stage $t+1$ for all scenarios $\omega_{t+1} \in \Omega_{t+1}$, and is a weaker lower-bound
estimate if there is not a sufficient number of cuts.

We are especially interested in the lower-bound estimate of the first stage costs,
which we obtain by solving the first stage problem. If the first stage problem is
balanced with the second stage, that is, if the cuts added so far to the first stage
problem fully represent the expected second stage costs, and if the second stage is
balanced with the third stage for all scenarios $\omega_2 \in \Omega_2$ and all values of $x_1$ passed
to it from the first stage, and so forth till stage $T - 1$, then the solution of the first
stage problem is the optimum solution of the multi-stage stochastic linear program.
In this case the lower bound estimate of $z_1$ takes on the value of the total expected
costs of the multi-stage problem.

To obtain an upper bound of the total expected costs of the multi-stage problem,
we evaluate the expected costs of the current first stage trial solution $x_1$. This can be
accomplished by sampling paths from stages $2, \ldots, T$. For a reference, see Pereira and
Pinto (1989) [26]. To efficiently sample a small number of paths to obtain an accurate
estimate of the expected costs associated with $x_1$, we also use importance sampling.
We define a path $s^\omega = (x_1, x_2, \ldots, x_T)^\omega$, $\omega \in \Omega$, where $\Omega = \{\Omega_2 \times \Omega_3 \times \cdots \times \Omega_T\}$,
as a sequence of optimal solutions $x_t^{\omega_t}$ of stage $t$ scenario $\omega_t$ problems, $t = 2, \ldots, T$,
with $x_1$ being the first stage trial solution. A path is computed by observing the
"wait-and-see" requirements: we pass $x_1$ to the second stage, solve the second
stage problem for scenario $\omega_2$ and obtain the optimal solution $x_2^{\omega_2}$. Next we pass the
obtained second stage solution $x_2^{\omega_2}$ to the third stage and solve the third stage problem
for scenario $\omega_3$ to obtain $x_3^{\omega_3}$. We continue in this way until we obtain $x_T^{\omega_T}$ in stage
$T$. Note that when solving the stage $t$ problem no future outcomes $\omega_{t+1}, \ldots, \omega_T$ are
used. All future information at each stage is solely represented by means of the cuts
added in stage $t$ so far. The cost of a path $C(s^\omega)$ is given by $C(s^\omega) = \sum_{t=1}^{T} c_t x_t^{\omega_t}$.
The expected value of the costs of all paths $s^\omega$, $\omega \in \Omega$, gives an upper bound to
the costs of a trial solution $x_1$.
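
The forward pass that builds a path can be sketched as below. Several things are assumptions made for illustration: `solve_stage` is a hypothetical toy rule standing in for the stage $t$ linear program with its cuts, the stage data are invented, and scenarios are drawn from the original distribution rather than from an importance density.

```python
import random

def solve_stage(t, x_prev, omega):
    """Hypothetical toy stage problem replacing the stage-t LP and its cuts.

    x_prev is the stock carried in, omega the realised demand; every unit of
    unmet demand costs 2. Returns the stage decision and the stage cost.
    """
    shortage = max(omega - x_prev, 0)
    x_t = max(x_prev - omega, 0)
    return x_t, 2 * shortage

def path_cost(x1, c1, stages, rng):
    """Cost of one path: only the current outcome is revealed at each stage."""
    x_prev, total = x1, c1 * x1
    for t, (outcomes, probs) in enumerate(stages, start=2):
        omega = rng.choices(outcomes, weights=probs)[0]   # outcome of stage t
        x_prev, cost_t = solve_stage(t, x_prev, omega)    # no future outcomes used
        total += cost_t
    return total

def upper_bound_estimate(x1, c1, stages, n, rng):
    """Average cost over n sampled paths for the trial solution x1."""
    return sum(path_cost(x1, c1, stages, rng) for _ in range(n)) / n

# Toy three-stage data (assumed): demand outcomes and probabilities per stage.
stages = [([1, 3], [0.5, 0.5]), ([0, 2], [0.5, 0.5])]
estimate = upper_bound_estimate(4, 1.0, stages, 200, random.Random(0))
```

Each path solves one problem per stage, so the work grows linearly in the number of stages rather than with the size of the scenario tree.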

We sample paths by applying the importance sampling scheme to the $\sum_{t=2}^{T} h_t$-dimensional
space of the random parameters $V_t$, $t = 2, \ldots, T$.

For sampling paths the importance density $q(V)$ is computed based on the additive
marginal approximation function, analogous to the way it was defined earlier:

$$\Gamma(V) = C(\tau) + \sum_{t=2}^{T} \sum_{i=1}^{h_t} \left[ C(\tau_{2,1}, \ldots, \tau_{t,i-1}, V_{t,i}, \tau_{t,i+1}, \ldots, \tau_{T,h_T}) - C(\tau) \right],$$

where $V = (V_2, \ldots, V_t, \ldots, V_T)$ and $\tau = (\tau_2, \ldots, \tau_t, \ldots, \tau_T)$. Sampling
paths $\omega \in \Omega$ according to this importance sampling scheme, we obtain an equal
number of sample points $\omega_t \in \Omega_t$ in stages $t = 2, \ldots, T$. At these sample points we
define the current stage $t$ scenario $\omega_t$ subproblems and generate cuts to be added at
stages $t = 1, \ldots, T - 1$ by employing importance sampling as described above.


The overall procedure works as follows: Solving the stage 1 problem in iteration
1 we obtain a trial solution $x_1$ and a lower bound estimate of the expected costs $z_1$.
Now we employ the path sampling procedure to obtain an upper bound estimate of
the expected costs. If the upper bound estimate and the lower bound estimate
are within a given optimality tolerance, we call the first stage solution the optimal
solution of the multi-stage problem, and quit. Otherwise we generate cuts in stages
$1, \ldots, T - 1$. The path sampling procedure used for the upper bound estimate has
produced sample points $\omega_t \in \Omega_t$ in stages $t = 2, \ldots, T$ with corresponding ancestor
solutions $x_1$ and $x_t^{\omega_t}$ in stages $t = 2, \ldots, T - 1$ to be passed to the current stage
$t$ scenario $\omega_t$ problem. Starting at stage $T - 1$ and moving backwards till stage 1,
we take each sample problem $\omega_t$ in stage $t$, and finally the stage 1 problem, as the
current master problem and compute cuts by sampling again $\omega_{t+1} \in \Omega_{t+1}$ descendant
problems until each scenario problem $\omega_t$ in stage $t$ is balanced with stage $t + 1$ with
regard to ancestor solutions $x_{t-1}$ which have been passed from stage $t - 1$. Arriving
at stage 1 we obtain a new solution $x_1$ and a new lower bound estimate. We continue
as defined above by sampling new paths for the upper bound estimate. Finally, after
a finite number of iterations, upper and lower bound estimates will be sufficiently
close. Upper and lower bound estimates can be seen as sums of i.i.d. random
terms which for sample sizes of 30 or more can be assumed normally distributed with
known (derived from the sampling process) variances. A 95% confidence interval of
the obtained solution is computed.
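
The confidence interval and the stopping test can be illustrated with a short sketch. It assumes the normal approximation mentioned above with the usual 1.96 quantile, and a relative gap as the optimality tolerance. The function names, the tolerance value and the sample data are all hypothetical.

```python
import math
import statistics

def confidence_interval_95(samples):
    """Two-sided 95% interval for the mean under the normal approximation."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)   # estimated from the sample
    half_width = 1.96 * std_err
    return mean - half_width, mean + half_width

def bounds_close(lower, upper, tol=1e-3):
    """Assumed stopping rule: relative gap between the two bound estimates."""
    return abs(upper - lower) <= tol * max(abs(lower), 1e-12)

# Invented path costs for illustration (30 observations).
path_costs = [1.0, 2.0, 3.0, 4.0, 5.0] * 6
low, high = confidence_interval_95(path_costs)
```

The half width shrinks with the square root of the sample size, which is why the text treats 30 observations as the point from which the normal approximation is used.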

8.  Computational  Experience 

Computational  results  of  using  Benders  decomposition  and  importance  sampling  for 
two-stage  asset  allocation  problems  can  be  found  in  Infanger  (1991)  [18]  and  Dantzig 
and  Infanger  (1991)  [7]  where  we  report  on  the  solution  of  test  problems  with  up 
to  52  stochastic  parameters  and  a  number  of  universe  scenarios  of  more  than  10^“'. 
These  problems  were  formulated  as  two-stage  stochastic  programs.  Using  importance 
sampling  and  sample  sizes  between  200  and  600  very  accurate  results  were  obtained, 
e.g.  the  estimated  95%  confidence  interval  was  less  than  0.8%  on  each  side  based  on 
the  optimal  objective  function  value.  Additional  tests  on  these  examples  showed  that 
the  ratio  of  variance  reduction  obtained  by  using  importance  sampling  versus  crude 
(naive)  Monte  Carlo  sampling  was  about  10“®. 

Inspired by these results we implemented an earlier version of the methodology
described above for the multi-stage case, which did not consider dependency between
stages. Instead of the path sampling procedure for obtaining upper bounds we
implemented a procedure in which we sampled points rather than paths, which required the
handling of an exponentially expanding decision tree. Therefore, even when we used
very small sample sizes, the number of stages that was practical to solve was limited.

We tested problems with up to 3 stages. FIS is a 3-stage test problem derived from a
2-stage financial portfolio problem found in Mulvey and Vladimirou (1989) [22]. The
problem is to select a portfolio which maximizes expected returns in future periods,
taking into account the possibility of revising the portfolio in each period. There are
transaction costs and bounds on the holdings and turnovers. Our test problem covers
a planning horizon of 3 periods, whereas the original Mulvey-Vladimirou test problem
was a 2-stage problem which compressed all future periods into a single second stage.
They solved the stochastic problem by restricting the number of scenarios in $\Omega$.

We assumed the returns of the stocks in the future periods to be independent
stochastic parameters with 3 outcomes each. With 13 assets with uncertain returns,
the problem had 26 stochastic parameters instead of 39, because after the last stage
decision is made the expected money-value of the portfolio can be evaluated. The
number of universe scenarios was $2.5 \cdot 10^{12}$. (The deterministic equivalent formulation
of  the  problem  has  more  than  lO'"*  rows  and  a  similar  number  of  columns.)  We 
obtained  an  estimated  optimal  solution  of  the  3-stage  stochastic  problem  using  a 
sample size of only 50 per stage. The optimal objective function value was estimated
to be 1.10895 with an estimated 95% confidence interval of 0.004% on the left side and
0.001%  on  the  right  side  of  the  obtained  objective  function  value.  Thus  the  optimal 
objective  value  lies  within  1.10881  <  z*  <  1.10895  with  95%  probability.  Note  how 
small  the  confidence  interval  is. 
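
The size of the scenario universe follows directly from the stated problem data: 26 independent stochastic parameters with 3 outcomes each. The short check below reproduces that count; the variable names are chosen only for this illustration.

```python
from math import prod

# 13 assets with uncertain returns in each of 2 future periods, 3 outcomes each.
outcomes_per_parameter = [3] * 26

# Independent parameters: the scenario count is the product of outcome counts.
universe_scenarios = prod(outcomes_per_parameter)

# Sample size per stage reported in the text, shown for comparison.
sample_size_per_stage = 50
sampled_fraction = sample_size_per_stage / universe_scenarios
```

The product is 3^26 = 2,541,865,828,329, roughly 2.5 · 10^12, so a sample of 50 touches a vanishingly small fraction of the universe scenarios.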

9.  Conclusion 

We  have  demonstrated  how  real-world  multi-period  asset  allocation  problems  can 
be  efficiently  solved  as  multi-stage  stochastic  linear  programs  using  our  approach  of 
combining  Benders  decomposition  and  importance  sampling.  The  numerical  results 
obtained  so  far  are  very  promising:  We  obtained  very  accurate  solutions  for  a  3-stage 
asset  allocation  test-problem  using  remarkably  small  sample  sizes. 